========================================================

Basic Data

Univariate Plots Section

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#that's a lot of receipts with a 0 amount. exclude the top 1% of contributions to zoom in a bit.
ggplotly(
  ggplot(data = pfc, aes(x = contb_receipt_amt)) +
    geom_histogram() +
    #limit x axis to omit the top 1% of values
    xlim(0, quantile(pfc$contb_receipt_amt, 0.99))
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 3032 rows containing non-finite values (stat_bin).
#are there zip codes more likely to contribute?
plot_ly(x = pfc$contbr_zip, data = pfc, type = "histogram")
#too specific.  are there cities more likely to contribute?
plot_ly(x = pfc$contbr_city, data = pfc, type = "histogram")
#I see you, Denver.  Then Boulder, where people can work in Denver.
#Are there occupations that are more likely to contribute?
plot_ly(x = pfc$contbr_occupation, data = pfc, type = "histogram")
## Warning: Ignoring 32 observations
#The largest groups are retirees. I did not expect retirees to be donating so actively.
#Which candidate received more donations? I've heard Colorado is fairly purple.
plot_ly(x = pfc$cand_nm, data = pfc, type = "histogram")
#Let's check that by party
plot_ly(x = pfc$cand_party, data = pfc, type = "histogram")

Univariate Analysis

##    cmte_id               cand_id                           cand_nm     
##  Length:164206      P00003392:70843   Clinton, Hillary Rodham  :70843  
##  Class :character   P60007168:53689   Sanders, Bernard         :53689  
##  Mode  :character   P80001571:13718   Trump, Donald J.         :13718  
##                     P60006111:12953   Cruz, Rafael Edward 'Ted':12953  
##                     P60005915: 7112   Carson, Benjamin S.      : 7112  
##                     P60006723: 2201   Rubio, Marco             : 2201  
##                     (Other)  : 3690   (Other)                  : 3690  
##                contbr_nm                contbr_city    contbr_st  
##  LENELL, MATT       :   390   DENVER          :32215   CO:164206  
##  IMMASCHE, SONIA    :   356   BOULDER         :14936              
##  SMITH, PHILIP      :   320   COLORADO SPRINGS:11039              
##  CASPERSON, CAROLINA:   304   FORT COLLINS    : 7494              
##  RAMSEY, WILLIAM    :   261   AURORA          : 6109              
##  HOFFMAN, TONI      :   240   LITTLETON       : 5461              
##  (Other)            :162335   (Other)         :86952              
##      contbr_zip          contbr_employer              contbr_occupation
##  804212057:   390   N/A          :23483   RETIRED              :38933  
##  805241517:   356   RETIRED      :20878   NOT EMPLOYED         :16446  
##  802212506:   322   SELF-EMPLOYED:12233   INFORMATION REQUESTED: 3861  
##  802034502:   304   NONE         :12047   ATTORNEY             : 3229  
##  814015616:   300   NOT EMPLOYED : 6761   TEACHER              : 2536  
##  80504    :   245   (Other)      :88626   (Other)              :99169  
##  (Other)  :162289   NA's         :  178   NA's                 :   32  
##  contb_receipt_amt  contb_receipt_dt    
##  Min.   :-16300.0   Min.   :2014-08-16  
##  1st Qu.:    15.0   1st Qu.:2016-02-28  
##  Median :    27.0   Median :2016-05-06  
##  Mean   :   102.6   Mean   :2016-05-11  
##  3rd Qu.:    75.0   3rd Qu.:2016-08-18  
##  Max.   : 18000.0   Max.   :2016-11-27  
##                                         
##                            receipt_desc    memo_cd   
##                                  :162356    :139603  
##  Refund                          :   883   X: 24603  
##  REDESIGNATION FROM PRIMARY      :   232             
##  REDESIGNATION TO GENERAL        :   225             
##  REDESIGNATION TO CRUZ FOR SENATE:   144             
##  SEE REDESIGNATION               :   116             
##  (Other)                         :   250             
##                                memo_text      form_tp      
##                                     :98139   SA17A:139760  
##  * EARMARKED CONTRIBUTION: SEE BELOW:52328   SA18 : 23563  
##  * HILLARY VICTORY FUND             :12312   SB28A:   883  
##  REDESIGNATION FROM PRIMARY         :  232                 
##  REDESIGNATION TO GENERAL           :  225                 
##  EARMARKED FROM MAKE DC LISTEN      :  163                 
##  (Other)                            :  807                 
##     file_num            tran_id       election_tp        zip           
##  Min.   :1003942   C10182166:     2        :   400   Length:164206     
##  1st Qu.:1077648   C10233294:     2   G2016: 52017   Class :character  
##  Median :1093618   C10233584:     2   P2016:111788   Mode  :character  
##  Mean   :1095013   C10248517:     2   P2020:     1                     
##  3rd Qu.:1119042   C10258303:     2                                    
##  Max.   :1134173   C10262748:     2                                    
##                    (Other)  :164194                                    
##    ZCTA_USE               cand_party     retiree_status    
##  Length:164206      democrat   :124675   Length:164206     
##  Class :character   republican : 38751   Class :character  
##  Mode  :character   libertarian:   502   Mode  :character  
##                     green      :   219                     
##                     independent:    59                     
##                                                            
## 
#some of these people were contributing early and often.  I'll compare those to the 2015-2016 contribution limits here later: http://www.fec.gov/info/contriblimitschart1516.pdf

summary(pfc$contb_receipt_amt)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -16300.0     15.0     27.0    102.6     75.0  18000.0
#the histogram on contributions had a large group around zero -- are there a large number of zero contributions?
length(which(pfc$contb_receipt_amt == 0))
## [1] 5
summary(pfc$contb_receipt_dt)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2014-08-16" "2016-02-28" "2016-05-06" "2016-05-11" "2016-08-18" 
##         Max. 
## "2016-11-27"
pfc %>% 
  group_by(memo_text) %>% 
  dplyr::summarise(n = n()) %>% 
  arrange(desc(n))
## # A tibble: 95 × 2
##                                  memo_text     n
##                                     <fctr> <int>
## 1                                          98139
## 2      * EARMARKED CONTRIBUTION: SEE BELOW 52328
## 3                   * HILLARY VICTORY FUND 12312
## 4               REDESIGNATION FROM PRIMARY   232
## 5                 REDESIGNATION TO GENERAL   225
## 6            EARMARKED FROM MAKE DC LISTEN   163
## 7         REDESIGNATION TO CRUZ FOR SENATE   144
## 8                     *BEST EFFORTS UPDATE   135
## 9                        SEE REDESIGNATION   116
## 10 REATTRIBUTION / REDESIGNATION REQUESTED    69
## # ... with 85 more rows

What is the structure of your dataset?

The data is normalized: there is one row per donation (or refund), 164,206 rows in total.

What is/are the main feature(s) of interest in your dataset?

We could roll up to see how much each candidate received as a whole, or how much specific earmarks received, and drill down to see if someone in the state went over a contribution limit.
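As a minimal sketch of the roll-up and drill-down described above, the base R below totals a small made-up data frame per candidate, then flags contributor/candidate/election combinations whose summed contributions exceed a limit. The toy rows and the names `toy`, `by_candidate`, `by_donor`, `over_limit` and `limit` are illustrative assumptions. The 2700 figure is my reading of the per-election individual limit in the FEC 2015-2016 chart and should be checked against that chart.

```r
# Toy stand-in for pfc; every row here is made up for illustration.
toy <- data.frame(
  cand_nm           = c("Candidate A", "Candidate A", "Candidate A", "Candidate B"),
  contbr_nm         = c("DOE, JANE", "DOE, JANE", "ROE, RICHARD", "DOE, JANE"),
  election_tp       = c("P2016", "P2016", "P2016", "G2016"),
  contb_receipt_amt = c(2000, 1000, 50, 500)
)

# Roll up: total received per candidate.
by_candidate <- aggregate(contb_receipt_amt ~ cand_nm, data = toy, FUN = sum)

# Drill down: total per contributor, candidate and election.
by_donor <- aggregate(contb_receipt_amt ~ contbr_nm + cand_nm + election_tp,
                      data = toy, FUN = sum)

# Assumed per-election limit (hypothetical parameter; verify against the FEC chart).
limit <- 2700
over_limit <- by_donor[by_donor$contb_receipt_amt > limit, ]

print(by_candidate)
print(over_limit)
```

Because refunds are stored as negative amounts, summing per contributor nets them out before the comparison with the limit.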

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Each candidate is ID’d, and each contributor is specifically named, along with the date and amount of their contribution. Any memos with specific earmarks or refunds are designated clearly.

Did you create any new variables from existing variables in the dataset?

No.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I transformed the date into date format to make summary analysis simpler on that variable. There were a lot of donations that seemed centered around zero. I excluded the top 1% of contributions while looking at these so I could see the data a bit closer. There are only 5 actual zero contributions; the others in the near-zero range appear to be micro-transactions and not actually zero.
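The check on zero versus near-zero amounts can be sketched in base R. The vector below is made up, and the 5-dollar cutoff for "near zero" is an arbitrary assumption rather than anything taken from the analysis. The same sketch trims at the 99th percentile by filtering the data, an alternative to `xlim()` that avoids the "removed rows" warning.

```r
# Made-up amounts standing in for pfc$contb_receipt_amt.
amt <- c(-50, 0, 0, 0.83, 1, 3, 5, 10, 15, 27, 75, 100, 250, 2700)

n_zero      <- sum(amt == 0)            # exactly zero
n_near_zero <- sum(amt > 0 & amt <= 5)  # small but non-zero (5 is an arbitrary cutoff)

# Keep only positive amounts at or below the 99th percentile, rather than
# letting xlim() drop the out-of-range rows with a warning.
cutoff  <- quantile(amt, 0.99)
trimmed <- amt[amt > 0 & amt <= cutoff]

n_zero
n_near_zero
length(trimmed)
```

Note that this filter also drops refunds (negative amounts) and exact zeros, which `xlim(0, ...)` would treat slightly differently: it keeps zeros.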

Bivariate Plots Section

## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1395 rows containing non-finite values (stat_bin).
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 414 rows containing missing values (geom_bar).

#roll that up by party, keep log scale since the totals coming into the various parties differ by a large amount
p <- ggplot(aes(x = contb_receipt_amt), data = subset(pfc, !is.na(contb_receipt_amt) & !is.na(cand_party))) +
  geom_histogram() +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap( ~ cand_party)

#run through plotly to clean up vis a bit & add mouseovers
ggplotly(p)
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1395 rows containing non-finite values (stat_bin).
## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Removed 3032 rows containing non-finite values (stat_bin).
## Warning: Removed 15 rows containing missing values (geom_path).

## Warning: Removed 78 rows containing missing values (geom_point).

pfc %>% 
  #mutate to add a column for "EARMARK" in memo vs not
  mutate(earmarked = if_else(
    grepl("EARMARK", memo_text), "earmarked", "not_earmarked"
  )) %>% 
  group_by(earmarked, cand_party) %>% 
  dplyr::summarise(n = n()) %>% 
  #always ungroup when >1 group
  ungroup() %>% 
  ggplot(aes(x = earmarked, y = n)) +
  geom_bar(stat = "identity") +
  #log y-scale since the non-major parties received far fewer donations (n is a count, not $)
  scale_y_log10() +
  facet_wrap(~cand_party)

#I've heard there's a strong rural/urban funding divide -- visible in this data?
pfc.contb_by_zip <- pfc %>% 
  #only include contributions where the zip code is in CO, and the zip exists
  semi_join(colo_zip_cw, by="ZCTA_USE") %>% 
  group_by(ZCTA_USE) %>% 
  dplyr::summarise(value = sum(contb_receipt_amt), n = n())

#choropleth requires region be named region
colnames(pfc.contb_by_zip)[1] <- "region"

#use the ZCTA_USE list (Colorado zips in ZCTA format) to zoom:
zip_choropleth(pfc.contb_by_zip, title = "Donations by Zip Code", zip_zoom = colo_zip_cw$ZCTA_USE)
## Warning in self$bind(): The following regions were missing and are being
## set to NA: 80727, 81330, 81126, 81044, 81642, 81081, 80862, 80476, 80546,
## 81640, 81155, 81030, 81124, 80745, 81633, 80801, 81021, 81029, 80471,
## 81077, 81092, 80757, 81225, 80740, 81045, 81638, 81087, 80747, 80726,
## 81649, 80622, 80310, 80913, 80479, 80744, 81084, 81227, 80024, 81129,
## 81231, 81239, 81248, 80754, 80419, 80938, 81027, 80939, 80511, 80746,
## 81334, 81038, 81148, 81043, 80294, 80423

rm(pfc.contb_by_zip)
#this seems to reflect the histogram of cities: most of the $ came from urban areas

Bivariate Analysis

## # A tibble: 35 × 3
##                                                           memo_text     n
##                                                              <fctr> <int>
## 1  * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING    23
## 2                                            REFUNDED ON 10/24/2016     9
## 3                                               REFUND TO BE ISSUED     6
## 4        * EARMARKED CONTRIBUTION: SEE BELOW REFUNDED ON 10/24/2016     3
## 5                                            REFUNDED ON 10/10/2016     3
## 6                                      REATTRIBUTION/REFUND PENDING     2
## 7                                   REFUNDED $1000.00 ON 12/29/2015     2
## 8                                            REFUNDED ON 10/18/2016     2
## 9                                             REFUNDED ON 7/12/2016     2
## 10                                     $0.83 REFUNDED ON 10/25/2016     1
## # ... with 25 more rows, and 1 more variables: meanrefund <dbl>
## # A tibble: 25 × 6
##                      cand_nm total_donation mean_donation median_donation
##                       <fctr>          <dbl>         <dbl>           <dbl>
## 1    Clinton, Hillary Rodham      8544967.4     120.61837              25
## 2           Sanders, Bernard      2298527.4      42.81189              27
## 3           Trump, Donald J.      2456085.8     179.04110              60
## 4  Cruz, Rafael Edward 'Ted'      1011921.4      78.12255              50
## 5        Carson, Benjamin S.       694383.1      97.63541              50
## 6               Rubio, Marco       632342.2     287.29768             100
## 7             Fiorina, Carly       223509.6     236.26813             100
## 8                 Paul, Rand       105243.5     160.67716              50
## 9              Johnson, Gary       128571.1     256.11771             100
## 10                 Bush, Jeb       276847.0     692.11750             250
## # ... with 15 more rows, and 2 more variables: stdev_donation <dbl>,
## #   n <int>
## # A tibble: 5 × 5
##    cand_party total_donation mean_donation median_donation      n
##        <fctr>          <dbl>         <dbl>           <dbl>  <int>
## 1    democrat    10930322.95      87.67053              25 124675
## 2  republican     5762403.22     148.70334              50  38751
## 3 libertarian      128571.09     256.11771             100    502
## 4       green       21388.27      97.66333              50    219
## 5 independent       12156.00     206.03390             100     59
## # A tibble: 616 × 5
##          contbr_city total_donation mean_donation median_donation     n
##               <fctr>          <dbl>         <dbl>           <dbl> <int>
## 1             DENVER      4190800.7     130.08849              27 32215
## 2            BOULDER      1752512.3     117.33478              27 14936
## 3   COLORADO SPRINGS      1018770.6      92.28830              30 11039
## 4       FORT COLLINS       520008.9      69.39003              25  7494
## 5          ENGLEWOOD       461519.4     192.54043              35  2397
## 6          LITTLETON       454958.3      83.31044              27  5461
## 7              ASPEN       432404.5     428.54759              40  1009
## 8             AURORA       376474.7      61.62624              25  6109
## 9  GREENWOOD VILLAGE       333703.3     324.93023              75  1027
## 10        CENTENNIAL       330993.5     100.94341              28  3279
## # ... with 606 more rows
#news stories frequently reported an age difference between the parties' donors
donations_by_retiree <- pfc %>% 
  group_by(retiree_status, cand_party) %>% 
  dplyr::summarise(total_donation = sum(contb_receipt_amt),
                   mean_donation = mean(contb_receipt_amt),
                   median_donation = median(contb_receipt_amt),
                   n = n()) %>% 
  ungroup()

donations_by_retiree
## # A tibble: 10 × 6
##    retiree_status  cand_party total_donation mean_donation median_donation
##             <chr>      <fctr>          <dbl>         <dbl>           <dbl>
## 1     not_retired    democrat     8659734.69      88.92450            27.0
## 2     not_retired  republican     3794510.73     171.51106            50.0
## 3     not_retired libertarian      117956.59     262.12576           100.0
## 4     not_retired       green       15388.27      93.26224            50.0
## 5     not_retired independent        8288.50     184.18889           100.0
## 6         retired    democrat     2270588.26      83.19611            25.0
## 7         retired  republican     1967892.49     118.35523            50.0
## 8         retired libertarian       10614.50     204.12500           100.0
## 9         retired       green        6000.00     111.11111            50.0
## 10        retired independent        3867.50     276.25000            37.5
## # ... with 1 more variables: n <int>
## [1] 0.2
#were earmarked funds more likely to be higher amounts?
pfc %>% 
  group_by(if_else(
    grepl("EARMARK", memo_text), "Earmarked", "Not Earmarked"
  )) %>% 
  dplyr::summarise(total_donation = sum(contb_receipt_amt),
                   mean_donation = mean(contb_receipt_amt),
                   median_donation = median(contb_receipt_amt),
                   n = n())
## # A tibble: 2 × 5
##   `if_else(grepl("EARMARK", memo_text), ...` total_donation mean_donation
##                                        <chr>          <dbl>         <dbl>
## 1                                  Earmarked        2262311      43.06129
## 2                              Not Earmarked       14592530     130.67665
## # ... with 2 more variables: median_donation <dbl>, n <int>
#were Republicans/Democrats more likely to earmark funds?
pfc %>% 
  #mutate to add a column for "EARMARK" in memo vs not
  mutate(earmarked = if_else(
    grepl("EARMARK", memo_text), "earmarked", "not_earmarked"
  )) %>% 
  group_by(earmarked, cand_party) %>% 
  dplyr::summarise(n = n()) %>% 
  #always ungroup when >1 group
  ungroup() %>% 
  #only show democrats & republicans so this will be a 2x2
  filter(cand_party %in% c("democrat", "republican")) %>%
  #spread into 2x2
  tidyr::spread(earmarked, n) %>% 
  #remove explicit democrat/republican name column
  select(earmarked, not_earmarked) %>% 
  #translate into data matrix for phi friendliness
  data.matrix() %>% 
  #return two digits
  phi(digits = 2)
## [1] 0.38
#phi is 0.38, a bit further from zero, weak positive association

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I found it really interesting that the Democratic party fundraised successfully using smaller, more frequent donations than the Republican party: 124,675 donations with a median of 25 dollars, against 38,751 donations with a median of 50 dollars. I had read articles and heard news stories alleging this was their tactic, and it’s interesting to see how that strategy was successful in Colorado.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

I was previously unaware of how many refunds existed in election fundraising.
It was interesting, but not incredibly surprising, to see the total funding by region mirror population density.

What was the strongest relationship you found?

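The strongest relationship was a weak positive association (phi = 0.38) between earmarking and donating to Democratic candidates: Democrats were more likely to use earmarks. Combining this with the summarised earmark data, earmarked donations were also smaller on average, about 43 dollars versus about 131 dollars for non-earmarked donations.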
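For readers unfamiliar with phi, here is a hedged sketch of how the coefficient is computed for a 2x2 table, using the standard formula phi = (n11*n22 - n12*n21) / sqrt(product of the row and column totals). The counts are hypothetical and chosen only for illustration, and `phi_2x2` is a helper written for this sketch, not the function called in the analysis.

```r
# Hypothetical 2x2 counts: rows are parties, columns are earmark status.
counts <- matrix(c(40, 10,
                   20, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("democrat", "republican"),
                                 c("earmarked", "not_earmarked")))

# phi for a 2x2 table: cross-product difference over the square root of
# the product of the two row totals and the two column totals.
phi_2x2 <- function(m) {
  n11 <- m[1, 1]; n12 <- m[1, 2]
  n21 <- m[2, 1]; n22 <- m[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

phi_value <- phi_2x2(counts)
round(phi_value, 2)
```

A value of 0 means no association between the row and column variables, and values near 1 or -1 mean a strong one; the sign depends on how the rows and columns are ordered.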

Multivariate Plots Section

## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 116 rows containing missing values (geom_point).

#Were different parties getting more for retirees vs non-retirees?
ggplot(aes(x = cand_party, y=total_donation), data = filter(donations_by_retiree, cand_party %in% c("democrat", "republican"))) +
  geom_col(aes(fill = retiree_status))

## Warning in self$bind(): The following regions were missing and are being
## set to NA: 81241, 80727, 80825, 81025, 80861, 81330, 81126, 81044, 81642,
## 80131, 81653, 81081, 80818, 80427, 80434, 80862, 80476, 80729, 80830,
## 81024, 80546, 81325, 81640, 81155, 81030, 80448, 81124, 81136, 80103,
## 80611, 80650, 80728, 80745, 80749, 80750, 80742, 80137, 80743, 81335,
## 81525, 81633, 80802, 80805, 81320, 80807, 80801, 80815, 80832, 80834,
## 80864, 80459, 80451, 80928, 81021, 81029, 80471, 80483, 81058, 81062,
## 81064, 81071, 81073, 81077, 81092, 81331, 80426, 80741, 80812, 80735,
## 80757, 80804, 81225, 80649, 80740, 81045, 80475, 81638, 80755, 80836,
## 81087, 80653, 80747, 80726, 80823, 81649, 80622, 81146, 80310, 80824,
## 80913, 81041, 81057, 80479, 80833, 80744, 81084, 81253, 81227, 80024,
## 81624, 81129, 81132, 81151, 81231, 81239, 80705, 81248, 80737, 81422,
## 80754, 81434, 80810, 81646, 81648, 81655, 80419, 80914, 80938, 80468,
## 80473, 81027, 81049, 81069, 81630, 81059, 80939, 80511, 80293, 80746,
## 81140, 81610, 81334, 81038, 81148, 80624, 81431, 81043, 80294, 80423, 80623

## Warning in self$bind(): The following regions were missing and are being
## set to NA: 80727, 80652, 81330, 81126, 80733, 81044, 81642, 81081, 80862,
## 80476, 80546, 81640, 81155, 81030, 81124, 81128, 80025, 80101, 80722,
## 81426, 80745, 81633, 80801, 80822, 80420, 80455, 81021, 81029, 80469,
## 80471, 81077, 81092, 80497, 80478, 80757, 81225, 80740, 81045, 80457,
## 81138, 81638, 81087, 80747, 80726, 81649, 80622, 80310, 81332, 80913,
## 80479, 80744, 81084, 81227, 80024, 81129, 81152, 81231, 81239, 81248,
## 80721, 81411, 81433, 80754, 80419, 80938, 81027, 81076, 80939, 80511,
## 80746, 81334, 81038, 81148, 81043, 80294, 80423, 81040, 80264, 81243, 81033

## Warning in self$bind(): The following regions were missing and are being
## set to NA: 81241, 80727, 80825, 81025, 80861, 80652, 81330, 81126, 80733,
## 81044, 81642, 80131, 81653, 81081, 80818, 80427, 80434, 80862, 80476,
## 80729, 80830, 81024, 80546, 81325, 81640, 81155, 81030, 80448, 81124,
## 81128, 80025, 81136, 80101, 80103, 80611, 80650, 80722, 81426, 80728,
## 80745, 80749, 80750, 80742, 80137, 80743, 81335, 81525, 81633, 80802,
## 80805, 81320, 80807, 80801, 80815, 80832, 80834, 80864, 80822, 80420,
## 80459, 80451, 80455, 80928, 81021, 81029, 80469, 80471, 80483, 81058,
## 81062, 81064, 81071, 81073, 81077, 81092, 80497, 81331, 80426, 80741,
## 80478, 80812, 80735, 80757, 80804, 81225, 80649, 80740, 81045, 80475,
## 80457, 81138, 81638, 80755, 80836, 81087, 80653, 80747, 80726, 80823,
## 81649, 80622, 81146, 80310, 81332, 80824, 80913, 81041, 81057, 80479,
## 80833, 80744, 81084, 81253, 81227, 80024, 81624, 81129, 81132, 81151,
## 81152, 81231, 81239, 80705, 81248, 80721, 80737, 81411, 81422, 81433,
## 80754, 81434, 80810, 81646, 81648, 81655, 80419, 80914, 80938, 80468,
## 80473, 81027, 81049, 81069, 81076, 81630, 81059, 80939, 80511, 80293,
## 80746, 81140, 81610, 81334, 81038, 81148, 80624, 81431, 81043, 80294,
## 80423, 81040, 80623, 80264, 81243, 81033

Multivariate Analysis

## [1] 0.15
## [1] -0.07
## 
## Calls:
## m1: lm(formula = I(total_donation) ~ I(number_of_donations), data = frequent_contributors)
## m2: lm(formula = I(total_donation) ~ I(number_of_donations) + cand_party, 
##     data = frequent_contributors)
## 
## ==================================================================
##                                          m1             m2        
## ------------------------------------------------------------------
##   (Intercept)                        393.599***      492.426***   
##                                       (5.170)         (7.283)     
##   I(number_of_donations)              17.367***       14.862***   
##                                       (0.513)         (0.527)     
##   cand_party: republican/democrat                   -183.081***   
##                                                       (9.498)     
##   cand_party: libertarian/democrat                   -33.675      
##                                                      (53.518)     
##   cand_party: green/democrat                        -273.951**    
##                                                      (94.900)     
##   cand_party: independent/democrat                  -116.455      
##                                                     (157.603)     
## ------------------------------------------------------------------
##   R-squared                                   0.0            0.0  
##   adj. R-squared                              0.0            0.0  
##   sigma                                     866.9          862.4  
##   F                                        1145.8          306.9  
##   p                                           0.0            0.0  
##   Log-likelihood                        -291157.0      -290969.7  
##   Deviance                          26736283722.0  26456217911.4  
##   AIC                                    582320.0       581953.4  
##   BIC                                    582345.5       582012.7  
##   N                                       35577          35577    
## ==================================================================

That is an awful model: R-squared rounds to 0.0 for both m1 and m2.

I’m not seeing a predictable relationship between the variables here; I suspect there are hidden motivators not in this data set.
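The table above pairs highly significant coefficients with an R-squared of 0.0. With tens of thousands of rows that combination is common: a slope can be clearly non-zero while explaining almost none of the variance. The sketch below reproduces the effect on simulated data. The data frame `sim`, its coefficients and the noise level are assumptions made up for illustration, not estimates from `frequent_contributors`.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-in for frequent_contributors: a weak linear signal buried in noise.
n <- 2000
sim <- data.frame(number_of_donations = sample(1:50, n, replace = TRUE))
sim$total_donation <- 400 + 15 * sim$number_of_donations + rnorm(n, sd = 850)

fit <- lm(total_donation ~ number_of_donations, data = sim)

# Share of the variance in total_donation explained by the model.
r2 <- summary(fit)$r.squared
r2

# The slope estimate and its p-value sit in the coefficient table.
coef(summary(fit))
```

Reading `summary(fit)$r.squared` directly, instead of relying on a table rounded to one decimal place, shows how small the explained share actually is.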

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Donations mostly came from urban areas. Urban areas were also more likely to be donating Democratic. Combined, the urban areas were donating significantly more to the Democratic party.

Were there any interesting or surprising interactions between features?

It was also interesting that retirees were donating higher amounts to the Republican party. As a whole, retiree donations were almost purple: the Democratic total (about 2.27 million dollars) and the Republican total (about 1.97 million dollars) from retirees are very similar.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

I created a model and was completely unable to predict where someone would donate based on


Final Plots and Summary

Plot One

Description One

Most of the money donated by Colorado residents in the 2016 election went to Democratic candidates: 10.93 million dollars Democratic to 5.76 million dollars Republican.

Plot Two

Description Two

The Democrat/Republican donation split for retirees in Colorado is almost purple, barely leaning Democratic. However, the non-retiree population’s donations are heavily Democratic. This fits in nicely with the local wisdom that the urban areas vote Democratic, while the prairie votes Republican.
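The split can be made concrete by turning the totals into shares. The sketch below copies the Democratic and Republican totals from the `donations_by_retiree` output and computes each party's share within each retiree group. Only the two major parties are included, so the shares are of the two-party total, and the names `totals` and `dem_share` are introduced here for illustration.

```r
# Totals copied from the donations_by_retiree output (democrat and republican rows only).
totals <- data.frame(
  retiree_status = c("not_retired", "not_retired", "retired", "retired"),
  cand_party     = c("democrat", "republican", "democrat", "republican"),
  total_donation = c(8659734.69, 3794510.73, 2270588.26, 1967892.49)
)

# Two-party total within each retiree group, repeated on every row of that group.
group_total <- ave(totals$total_donation, totals$retiree_status, FUN = sum)
totals$share <- totals$total_donation / group_total

dem_share <- totals[totals$cand_party == "democrat", c("retiree_status", "share")]
dem_share
```

On these figures the Democratic share of two-party donations is roughly 54% among retirees and roughly 70% among non-retirees.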

Plot Three

## Warning in self$bind(): The following regions were missing and are being
## set to NA: 81241, 80727, 80825, 81025, 80861, 80652, 81330, 81126, 80733,
## 81044, 81642, 80131, 81653, 81081, 80818, 80427, 80434, 80862, 80476,
## 80729, 80830, 81024, 80546, 81325, 81640, 81155, 81030, 80448, 81124,
## 81128, 80025, 81136, 80101, 80103, 80611, 80650, 80722, 81426, 80728,
## 80745, 80749, 80750, 80742, 80137, 80743, 81335, 81525, 81633, 80802,
## 80805, 81320, 80807, 80801, 80815, 80832, 80834, 80864, 80822, 80420,
## 80459, 80451, 80455, 80928, 81021, 81029, 80469, 80471, 80483, 81058,
## 81062, 81064, 81071, 81073, 81077, 81092, 80497, 81331, 80426, 80741,
## 80478, 80812, 80735, 80757, 80804, 81225, 80649, 80740, 81045, 80475,
## 80457, 81138, 81638, 80755, 80836, 81087, 80653, 80747, 80726, 80823,
## 81649, 80622, 81146, 80310, 81332, 80824, 80913, 81041, 81057, 80479,
## 80833, 80744, 81084, 81253, 81227, 80024, 81624, 81129, 81132, 81151,
## 81152, 81231, 81239, 80705, 81248, 80721, 80737, 81411, 81422, 81433,
## 80754, 81434, 80810, 81646, 81648, 81655, 80419, 80914, 80938, 80468,
## 80473, 81027, 81049, 81069, 81076, 81630, 81059, 80939, 80511, 80293,
## 80746, 81140, 81610, 81334, 81038, 81148, 80624, 81431, 81043, 80294,
## 80423, 81040, 80623, 80264, 81243, 81033
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=39.028453,-106.048697&zoom=6&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
## Warning: Removed 1 rows containing missing values (geom_rect).

Description Three

I overlaid the map to show how high population areas and ‘resort’ areas donated significantly more funds to Democratic candidates than Republican candidates in the 2016 election cycle. The N/A areas are areas with very low population. I’m using ZCTA instead of true Zip code because choropleth runs off ZCTA, as does census data. In this graph, the bluest areas are the areas with the highest funding difference in favor of the Democratic party.
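The code behind this map is not shown, so the following is only a guess at the kind of calculation involved: a per-ZCTA difference between Democratic and Republican totals, shaped into the `region`/`value` columns that the earlier `zip_choropleth()` call expects. The toy rows and the names `toy`, `major`, `party_totals` and `diff_by_zcta` are assumptions for illustration.

```r
# Toy stand-in for pfc; ZCTAs and amounts are made up for illustration.
toy <- data.frame(
  ZCTA_USE          = c("80202", "80202", "80302", "80759", "80759"),
  cand_party        = c("democrat", "republican", "democrat", "democrat", "republican"),
  contb_receipt_amt = c(500, 100, 250, 20, 90)
)

# Keep the two major parties and total the amounts per ZCTA and party.
major <- toy[toy$cand_party %in% c("democrat", "republican"), ]
party_totals <- tapply(major$contb_receipt_amt,
                       list(major$ZCTA_USE, major$cand_party), sum)
party_totals[is.na(party_totals)] <- 0  # a party with no donations in a ZCTA counts as 0

# Positive value = more Democratic money, negative = more Republican money.
diff_by_zcta <- data.frame(
  region = rownames(party_totals),
  value  = party_totals[, "democrat"] - party_totals[, "republican"],
  row.names = NULL
)
diff_by_zcta
```

ZCTAs with no donations at all never appear in `diff_by_zcta`, which is consistent with the "regions were missing and are being set to NA" warnings printed above.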

Reflection

Using this data set, I checked several stories that had been trending in the news during the election cycle.

I often heard Colorado referred to as ‘purple’, leaning between Democratic and Republican, so I checked to see if the financial donations reflected that even split. This was not the case in the overall donation sums between the two parties.

In detail, I verified the news story that the Democratic funding strategy successfully raised more money through more frequent, smaller donations, while the Republican party went for larger, less frequent donations. The median Colorado donation was 25 dollars for Democrats and 50 dollars for Republicans (means of about 88 and 149 dollars). In the violin chart, there is a clear visual difference between the two parties at the large-donation end.
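The summary tables report both a mean and a median donation per party, and the two differ a lot (for Democrats, a mean of about 88 dollars against a median of 25). A small made-up example shows why: in a right-skewed set of amounts, a few large gifts pull the mean up while the median stays at the typical gift. The vector `donations` is hypothetical.

```r
# Made-up donation amounts: mostly small gifts plus one large one.
donations <- c(10, 15, 25, 25, 27, 50, 2700)

typical <- median(donations)  # middle value, unaffected by the size of the largest gift
average <- mean(donations)    # pulled upward by the single 2700 gift

c(median = typical, mean = average)
```

For comparing what a "usual" donor gave to each party, the median is the safer figure; the mean is more useful for reasoning about totals.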

I suspected the difference between the financials could be due to the difference between older and younger contributors, as the millennial vote was strongly Democratic this election season. That was supported by the data: the split among retirees was almost even, while the split among non-retirees was rather wide, 8,659,734.69 dollars Democratic to 3,794,510.73 dollars Republican.

Lastly, I’ve heard that the red/blue split is heavily influenced by geography, with Republican voters living on the prairie and Democratic voters living in the mountains and highly populated urban areas. This was validated in the data, as seen in the difference between blue/Democratic fundraising and white/Republican fundraising in the choropleth map.